Using String Vector based KNN for Keyword Extraction

Author

  • Taeho Jo
Abstract

In this research, we propose a string vector based KNN as an approach to keyword extraction. Keyword extraction may be viewed as an instance of word classification. Encoding words into numerical vectors causes three main problems: huge dimensionality, sparse distribution, and poor transparency. In previous work on text mining tasks, these problems were addressed by encoding texts into string vectors. Motivated by this, we encode words into string vectors, define a semantic operation on string vectors, and modify the K Nearest Neighbor algorithm into a string vector based version that is used for keyword extraction. We expect better performance and more compact representations than those obtained by encoding words or texts into numerical vectors. The goal of this research is therefore to implement a keyword extraction system that delivers these benefits.
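The pipeline the abstract describes (string vectors, a semantic similarity on them, and a KNN that uses that similarity) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the abstract does not define the semantic operation, so the position-wise averaging, the hand-made similarity table, the document identifiers `d1` to `d4`, and the function names are all assumptions made for the example.

```python
from collections import Counter


def element_similarity(a, b, table):
    """Similarity of two strings: 1.0 if identical, else a table lookup.

    The lookup table is a hypothetical stand-in for whatever semantic
    similarity the paper defines between string vector elements.
    """
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def string_vector_similarity(u, v, table):
    """Average position-wise similarity of two equal-length string vectors.

    Assumed form of the "semantic operation"; the abstract gives no formula.
    """
    if len(u) != len(v):
        raise ValueError("string vectors must have the same length")
    return sum(element_similarity(a, b, table) for a, b in zip(u, v)) / len(u)


def knn_classify(query, training, table, k=3):
    """Label a string vector by majority vote among its k most similar examples."""
    ranked = sorted(
        training,
        key=lambda item: string_vector_similarity(query, item[0], table),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): each word is encoded as a string vector of
# document identifiers, and labeled as keyword or non-keyword.
table = {
    frozenset(("d1", "d2")): 0.8,
    frozenset(("d3", "d4")): 0.7,
    frozenset(("d1", "d3")): 0.1,
}
training = [
    (("d1", "d2"), "keyword"),
    (("d2", "d1"), "keyword"),
    (("d3", "d4"), "non-keyword"),
    (("d4", "d3"), "non-keyword"),
]

print(knn_classify(("d1", "d1"), training, table, k=3))
```

Because the classifier only needs a similarity function, no numerical feature vector is ever built, which is the source of the compactness the abstract expects.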

Similar resources

Automatic Keyword Extraction from Documents Using Conditional Random Fields

Keywords are a subset of the words or phrases in a document that can describe the meaning of the document. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still do not have keywords assigned. Moreover, manual assignment of high quality keywords is expensive, time-consuming, and error prone. Therefore, most algorithms and systems aimed t...

Summarization of Web Pages by Keyword Extraction and Sentence Vector

In this paper we propose a system that runs in parallel with an ordinary search engine to provide the user with unified, summarized information. The system relieves the user of manually opening each web link returned by the search engine. To implement this feature in the search process, we propose a procedure that can identify the...

String Vector based KNN for Index Optimization

In this research, we propose the string vector based KNN as an approach to index optimization. The task may be viewed as an instance of word classification, and the problems in encoding words or texts into numerical vectors were solved by encoding texts into string vectors in previous works on text mining tasks. Influenced by the previous works, we encode words into string vectors, as ...

CBIR Processing Approach on Colored and Texture Images using KNN Classifier and Log-Gabor Respectively

Content Based Image Retrieval (CBIR), also called Query By Image Content (QBIC), is a method for retrieving stored images from a database by supplying a query image instead of text. This is achieved using proper feature extraction and a matching process. Here we have implemented two methods of content based image retrieval, using color and texture. In feature extraction ...

To Enhance A-KNN Clustering Algorithm for Improving Software Architecture

Software architecture is an important factor in the development of large, complex software systems. Software architecture decomposition is an important part of software design. Software clustering groups functions of a similar type into one cluster and the others into other clusters. K-means is the basis of clustering, but it has some limitations. Many clustering methods are used for decompos...

Publication date: 2016